IN THE CLAIMS 
Please amend the claims as indicated:

1-14. (canceled) 

15. (currently amended) A method comprising:

reading an HTML document of a web page as an analyzing object; 
conducting a temporary block analysis based on a description of HTML tags of the 
HTML document; 

using the HTML tags to temporarily divide the HTML document into blocks; 

identifying unnecessary information elements in the HTML document, wherein the 
unnecessary information elements include: 

plural information elements that include an OBJECT_IMAGE having a same 
Uniform Resource Locator (URL), wherein the OBJECT_IMAGE describes a 
type of media used to display the HTML document, 

a block of text in the HTML document that is shorter than a maximum
predetermined length, and wherein the block of text appears in the HTML
document more than a predetermined frequency,
multiple anchors having a same title, 

image tags that only perform a role of punctuation for text in the HTML 
document, and 

multiple text blocks having a same description; 

defining any block in the HTML document that is deemed to be meaningless as an
OBJECT_DELIMITER, wherein a block is deemed to be meaningless if that block contains only
said unnecessary information elements and at least one anchor; and

crawling only anchors found in blocks that have not been defined as
OBJECT_DELIMITERs.
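
For illustration only, and not as part of the claim language, the steps of claim 15 can be sketched in Python as follows. This is a minimal sketch under several assumptions: the document has already been divided into blocks, only two of the recited element types are modeled (short frequent text and anchors sharing a title), and "contains only said unnecessary information elements and at least one anchor" is read as "every element is unnecessary and the block has at least one anchor". The names `Block`, `unnecessary_texts`, `repeated_anchor_titles`, `is_delimiter`, and `anchors_to_crawl` are hypothetical, and the default thresholds are borrowed from claims 16 and 17.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical container for one temporarily divided block."""
    texts: list = field(default_factory=list)    # text fragments in the block
    anchors: list = field(default_factory=list)  # (title, href) pairs


def unnecessary_texts(blocks, max_bytes=12, min_count=10):
    """Texts shorter than max_bytes that appear more than min_count times."""
    counts = Counter(t for b in blocks for t in b.texts)
    # Assumption: length is measured in UTF-8 bytes.
    return {t for t, n in counts.items()
            if len(t.encode("utf-8")) < max_bytes and n > min_count}


def repeated_anchor_titles(blocks):
    """Anchor titles used by more than one anchor in the document."""
    counts = Counter(title for b in blocks for title, _ in b.anchors)
    return {title for title, n in counts.items() if n > 1}


def is_delimiter(block, bad_texts, bad_titles):
    """True if the block holds only unnecessary elements and has an anchor."""
    if not block.anchors:
        return False
    return (all(t in bad_texts for t in block.texts)
            and all(title in bad_titles for title, _ in block.anchors))


def anchors_to_crawl(blocks, max_bytes=12, min_count=10):
    """Return hrefs found in blocks that were not marked as delimiters."""
    bad_texts = unnecessary_texts(blocks, max_bytes, min_count)
    bad_titles = repeated_anchor_titles(blocks)
    return [href
            for b in blocks
            if not is_delimiter(b, bad_texts, bad_titles)
            for _, href in b.anchors]
```

In this sketch, a navigation block repeated across the page (for example, a "Home" link appearing eleven times) would be treated as a delimiter and its anchor skipped, while an anchor inside a block with unique text would still be crawled.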

16. (previously presented) The method of claim 15, wherein the maximum predetermined 
length is 12 bytes. 

17. (currently amended) The method of claim 16, wherein the [the] predetermined
frequency is ten times. 
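
For illustration only, the thresholds recited in claims 16 and 17 can be expressed as a single predicate. This is a minimal sketch: the function name `is_short_frequent_text` is hypothetical, and measuring length in UTF-8 bytes is an assumption, since the claims do not name an encoding. "Shorter than" and "more than" are both treated as strict comparisons.

```python
def is_short_frequent_text(text, occurrences, max_bytes=12, min_count=10):
    """Check one text block against the length and frequency thresholds.

    Assumption: length is measured in UTF-8 bytes, so multi-byte
    characters reach the limit sooner than ASCII characters do.
    """
    byte_length = len(text.encode("utf-8"))
    return byte_length < max_bytes and occurrences > min_count
```

Under these assumptions, a 4-byte label such as "Home" qualifies once it appears eleven times, but not at exactly ten occurrences, and a text of exactly 12 bytes never qualifies.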

18-20. (canceled) 

21. (currently amended) A computer-readable medium encoded with a computer program, 
wherein the computer program, when executed, performs the steps of: 

reading an HTML document of a web page as an analyzing object; 
conducting a temporary block analysis based on a description of HTML tags of the 
HTML document; 

using the HTML tags to temporarily divide the HTML document into blocks; 
identifying unnecessary information elements in the HTML document, wherein the 
unnecessary information elements include: 

plural information elements that include an OBJECT_IMAGE having a same 
Uniform Resource Locator (URL), wherein the OBJECT_IMAGE describes a
type of media used to display the HTML document, 

a block of text in the HTML document that is shorter than a maximum
predetermined length, and wherein the block of text appears in the HTML 
document more than a predetermined frequency, 
multiple anchors having a same title, 

image tags that perform a role of punctuation for text in the HTML document, and 
multiple text blocks having a same description; 
defining any block in the HTML document that is deemed to be meaningless as an
OBJECT_DELIMITER, wherein a block is deemed to be meaningless if that block contains only
said unnecessary information elements; and

crawling only anchors found in blocks that have not been defined as
OBJECT_DELIMITERs.
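
For illustration only, and not as part of the claim language, the step of using HTML tags to temporarily divide the document into blocks might be sketched with Python's standard `html.parser` module. The set `BLOCK_TAGS`, the class `BlockSplitter`, and the function `split_into_blocks` are hypothetical; the claims do not specify which tags act as block boundaries, so the chosen set is an assumption.

```python
from html.parser import HTMLParser

# Assumption: these tags mark block boundaries; the claims name no tags.
BLOCK_TAGS = {"div", "table", "tr", "td", "ul", "li", "p", "hr"}


class BlockSplitter(HTMLParser):
    """Collects text and anchors, starting a new block at each block tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = {"texts": [], "anchors": []}
        self._href = None   # href of the anchor currently being read
        self._title = []    # text pieces seen inside that anchor

    def _flush(self):
        # Store the current block only if it holds something.
        if self._current["texts"] or self._current["anchors"]:
            self.blocks.append(self._current)
        self._current = {"texts": [], "anchors": []}

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._href is not None:
            title = "".join(self._title).strip()
            self._current["anchors"].append((title, self._href))
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._href is not None:
            self._title.append(text)          # anchor title text
        else:
            self._current["texts"].append(text)


def split_into_blocks(html):
    """Return a list of blocks, each with its texts and (title, href) anchors."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    splitter._flush()  # keep any trailing content after the last block tag
    return splitter.blocks
```

Each resulting block keeps its text fragments separate from its anchors, which is the shape a later step would need in order to decide whether the block contains only unnecessary information elements.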

22. (previously presented) The computer-readable medium of claim 21, wherein the
maximum predetermined length is 12 bytes. 

23. (previously presented) The computer-readable medium of claim 21, wherein the
predetermined frequency is ten times. 

24. (currently amended) A method comprising: 

dividing an HTML document into blocks; 

identifying unnecessary information elements in the HTML document, wherein the 
unnecessary information elements include: 

a block of text in the HTML document that is shorter than a maximum
predetermined length, and wherein the block of text appears in the HTML
document more than a predetermined frequency, 
multiple anchors having a same title, 

image tags that only perform a role of punctuation for text in the HTML 
document, and 

multiple text blocks having a same description; 

defining any block in the HTML document that is deemed to be meaningless, wherein a 
block is deemed to be meaningless if that block contains only the unnecessary information
elements and at least one anchor; and 

crawling only anchors found in blocks that have not been deemed meaningless [[for]] due 
to containing only the unnecessary information elements.